Cross-genre Gender Identification in Russian Texts Using Topic Modeling Working Note: Team DUBL

نویسندگان

  • Gabriella Skitalinskaya
  • Liliya Akhtyamova
  • John Cardiff
چکیده

In this paper, we describe the results of gender identification from Team DUBL. We used a topic modeling approach for identifying the author’s gender based on his/her written texts. The model was trained on the RusProfiling PAN 2017 Twitter Corpus that contains data in the Russian language. Themodel has been evaluated on texts of other genres, including texts such as letters to a friend, online reviews, Facebook posts and etc. Our model has obtained competitive results and has been shown to outperform more sophisticated algorithms on gender identification.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Winning Approach to Cross-Genre Gender Identification in Russian at RUSProfiling 2017

We present the CIC systems submitted to the 2017 PAN shared task on Cross-Genre Gender Identification in Russian texts (RUSProfiling). We submitted five systems. One of them was based on a statistical approach using only lexical features, and other four on machine-learning techniques using some combinations of genderspecific Russian grammatical features, word and character n-grams, and suffix n...

متن کامل

Overview of the RUSProfiling PAN at FIRE Track on Cross-genre Gender Identification in Russian

Author profiling consists of predicting some author’s traits (e.g. age, gender, personality) from her writing. After addressing at PAN@CLEF mainly age and gender identification, in this RusProfiling PAN@FIRE track we have addressed the problem of predicting author’s gender in Russian from a cross-genre perspective: given a training set on Twitter, the systems have been evaluated on five differe...

متن کامل

Gender Prediction for Authors of Russian Texts Using Regression And Classification Techniques

Automatic extraction of information about authors of texts (gender, age, psychological type, etc.) based on the analysis of linguistic parameters has gained a particular significance as there are more online texts whose authors either avoid providing any personal data or make it intentionally deceptive despite of it being of practical importance in marketing, forensics, sociology. These studies...

متن کامل

Applying Multi-Dimensional Analysis to a Russian Webcorpus: Searching for Evidence of Genres

The paper presents an application of Multidimensional (MD) analysis initially developed for the analysis of register variation in English (Biber, 1988) to the investigation of a genre diverse corpus, which was built from modern texts of the Russian Web. The analysis is based on the idea that each linguistic feature has different frequencies in different registers, and statistically stable co-oc...

متن کامل

Overview of the PAN/CLEF 2015 Evaluation Lab

This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of text mining research focusing on the identification of personal traits of authors left behind in texts unintentionally. PAN 2015 comprises three tasks: plagiarism detection, author identification and author profiling studying important variations of these problem...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017